Beyond Sequential Covering - Boosted Decision Rules

Authors

  • Krzysztof Dembczynski
  • Wojciech Kotlowski
  • Roman Slowinski
Abstract

Since the beginning of machine learning, rule induction has been regarded as one of the most important problems in the field. One of the first rule induction algorithms was AQ, introduced by Michalski in the early 1980s. AQ and several other well-known algorithms, such as CN2 and Ripper, are all based on sequential covering. With the advancement of machine learning, new techniques based on statistical learning were introduced. One of them, called boosting, or forward stagewise additive modeling, is a general induction procedure that turned out to be particularly efficient in binary classification and regression tasks. When boosting is applied to the induction of decision rules, it can be treated as a generalization of sequential covering, because it approximates the solution of the prediction task by sequentially adding new rules to the ensemble without adjusting those that have already entered it. Each rule is fitted by concentrating on the examples that were hardest to classify correctly by the rules already present in the ensemble. In this paper, we present a general scheme for learning an ensemble of decision rules in a boosting framework, using different loss functions and minimization techniques. This scheme, called ENDER, covers such algorithms as SLIPPER, LRI and MLRules. A computational experiment compares these algorithms on benchmark data.
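
The stagewise idea in the abstract can be illustrated with a minimal sketch. It is not the ENDER algorithm itself: it assumes binary labels in {-1, +1}, the exponential loss, rules made of a single threshold condition, and a small smoothing constant `EPS`. All function names (`fit_rule`, `fit_rule_ensemble`, `rule_output`, `predict`) are hypothetical and chosen only for this illustration.

```python
import math

EPS = 1e-3  # smoothing constant (an assumption) to keep rule responses finite


def rule_output(rule, row):
    """A rule returns its response alpha on covered examples and abstains (0) otherwise."""
    j, t, sign, alpha = rule
    covers = row[j] >= t if sign == 1 else row[j] < t
    return alpha if covers else 0.0


def fit_rule(X, y, w):
    """Search single-condition rules; keep the one with the lowest weighted exponential loss."""
    best_rule, best_loss = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):  # sign=1 covers x[j] >= t, sign=-1 covers x[j] < t
                w_pos = w_neg = w_out = 0.0
                for row, yi, wi in zip(X, y, w):
                    covers = row[j] >= t if sign == 1 else row[j] < t
                    if not covers:
                        w_out += wi
                    elif yi == 1:
                        w_pos += wi
                    else:
                        w_neg += wi
                # Smoothed minimiser of w_pos*exp(-a) + w_neg*exp(a) over a.
                alpha = 0.5 * math.log((w_pos + EPS) / (w_neg + EPS))
                loss = w_out + w_pos * math.exp(-alpha) + w_neg * math.exp(alpha)
                if loss < best_loss:
                    best_rule, best_loss = (j, t, sign, alpha), loss
    return best_rule


def fit_rule_ensemble(X, y, n_rules=5):
    """Add rules one at a time; rules already in the ensemble are never adjusted."""
    f = [0.0] * len(X)  # current ensemble score for each training example
    rules = []
    for _ in range(n_rules):
        # Examples the current ensemble handles badly receive the largest weights.
        w = [math.exp(-yi * fi) for yi, fi in zip(y, f)]
        total = sum(w)
        w = [wi / total for wi in w]
        rule = fit_rule(X, y, w)
        rules.append(rule)
        f = [fi + rule_output(rule, row) for fi, row in zip(f, X)]
    return rules


def predict(rules, row):
    """Classify by the sign of the summed rule responses."""
    score = sum(rule_output(rule, row) for rule in rules)
    return 1 if score >= 0 else -1


# Toy data (made up for this sketch): one feature, classes split at x = 2.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [-1, -1, 1, 1]
rules = fit_rule_ensemble(X, y, n_rules=5)
predictions = [predict(rules, row) for row in X]
```

Sequential covering would remove the examples a rule covers before fitting the next rule. The sketch keeps every example and only reweights it, which is the sense in which boosting generalizes covering.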

Similar resources

VC-DomLEM: Rule induction algorithm for variable consistency rough set approaches

We present a general rule induction algorithm based on sequential covering, suitable for variable consistency rough set approaches. This algorithm, called VC-DomLEM, can be used for both ordered and non-ordered data. In the case of ordered data, the rough set model employs dominance relation, and in the case of non-ordered data, it employs indiscernibility relation. VC-DomLEM generates a minima...

Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients

In this paper, we present a boosted SVM dedicated to solving imbalanced data problems. The proposed solution combines the benefits of using ensemble classifiers for uneven data with cost-sensitive support vector machines. Further, we present an oracle-based approach for extracting decision rules from the boosted SVM. In the next step we examine the quality of the proposed method by comparing the...

Sequential Optimization of γ-Decision Rules

The paper is devoted to the study of an extension of the dynamic programming approach which allows sequential optimization of approximate decision rules relative to length, coverage and number of misclassifications. The presented algorithm constructs a directed acyclic graph Δγ(T) whose nodes are subtables of the decision table T. Based on the graph Δγ(T) we can describe all irredundant γ-decision r...

A heuristic covering algorithm has higher predictive accuracy than learning all rules

The induction of classification rules has been dominated by a single generic technique—the covering algorithm. This approach employs a simple hill-climbing search to learn sets of rules. Such search is subject to numerous widely known deficiencies. Further, there is a growing body of evidence that learning redundant sets of rules can improve predictive accuracy. The ultimate end-point of a move...

Optimal sequential procedures with Bayes decision rules

In this article, a general problem of sequential statistical inference for general discrete-time stochastic processes is considered. The problem is to minimize an average sample number given that Bayesian risk due to incorrect decision does not exceed some given bound. We characterize the form of optimal sequential stopping rules in this problem. In particular, we have a characterization of the...

Publication date: 2010